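Self-supervised visual representation learning has recently attracted significant research interest. While a common way to evaluate self-supervised representations is through transfer to various downstream tasks, we instead investigate the problem of measuring their interpretability, i.e. understanding the semantics encoded in the raw representations. We formulate the latter as estimating the mutual information between the representation and a space of manually labelled concepts. To quantify this, we introduce a decoding bottleneck: information must be captured by simple predictors that map concepts to clusters in representation space. The approach, which we call reverse linear probing, yields a measure that is sensitive to the semantics of the representation. The measure is also able to detect when the representation contains combinations of concepts (e.g. "red apple") rather than just individual attributes ("red" and "apple" independently). Finally, we propose to use supervised classifiers to automatically label large datasets in order to enrich the space of concepts used for probing. We use our method to evaluate a large number of self-supervised representations, rank them by interpretability, highlight the differences that emerge compared with the standard evaluation using linear probes, and discuss several qualitative insights. Code at: https://github.com/iro-cp/ssl-qrp.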
translated by 谷歌翻译
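The first abstract frames interpretability as mutual information between a quantized (clustered) representation and labelled concepts. The following is a minimal sketch of that quantity only, assuming each sample already has a cluster id and a single concept label. The function name `mutual_information` and the toy inputs are hypothetical and are not taken from the authors' code.

```python
import math
from collections import Counter


def mutual_information(clusters, concepts):
    """Empirical mutual information (in bits) between two label sequences.

    Assumption: `clusters` holds one cluster id per sample and `concepts`
    holds one concept label per sample; both have the same length.
    """
    n = len(clusters)
    joint = Counter(zip(clusters, concepts))  # counts of (cluster, concept) pairs
    cluster_counts = Counter(clusters)
    concept_counts = Counter(concepts)
    mi = 0.0
    for (c, y), k in joint.items():
        # p(c, y) * log2( p(c, y) / (p(c) * p(y)) ), written with raw counts
        mi += (k / n) * math.log2(k * n / (cluster_counts[c] * concept_counts[y]))
    return mi


# Toy data: clusters that line up with concepts versus clusters that do not.
aligned = mutual_information([0, 0, 1, 1], ["red", "red", "apple", "apple"])
unrelated = mutual_information([0, 0, 1, 1], ["red", "apple", "red", "apple"])
```

A higher value means that knowing the cluster tells more about the concept. The decoding bottleneck described in the abstract (simple predictors from concepts to clusters) is not modelled here.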
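Causal representation learning is the task of identifying the underlying causal variables and their relations from high-dimensional observations, such as images. Recent work has shown that the causal variables can be reconstructed from temporal sequences of observations, under the assumption that there are no instantaneous causal relations between them. In practical applications, however, the measurement or frame rate may be slower than many of the causal effects. This effectively creates "instantaneous" effects and invalidates previous identifiability results. To address this issue, we propose iCITRIS, a causal representation learning method that can handle instantaneous effects in temporal sequences when perfect interventions with known intervention targets are available. iCITRIS identifies the causal factors from temporal observations while simultaneously using a differentiable causal discovery method to learn their causal graph. In experiments on three video datasets, iCITRIS accurately identifies the causal factors and their causal graph.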
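Progress in self-supervised learning has brought strong general-purpose image representation learning methods. So far, however, it has mostly focused on image-level learning. In turn, tasks such as unsupervised image segmentation have not benefited from this trend, because they require spatially diverse representations. Learning dense representations is challenging, since in the unsupervised setting it is not clear how to guide the model towards representations that correspond to various potential object categories. In this paper, we argue that self-supervised learning of object parts is a solution to this problem. Object parts are generalizable: a priori they are independent of any object definition, but a posteriori they can be grouped to form objects. To this end, we leverage the recently proposed Vision Transformer's ability to attend to objects and combine it with a spatially dense clustering task for fine-tuning the spatial tokens. Our method surpasses the state of the art on three semantic segmentation benchmarks by 17%-3%, showing that our representations are versatile under various object definitions. Finally, we extend this to fully unsupervised segmentation, which avoids using label information entirely, even at test time, and show that a simple method for automatically merging the discovered object parts, based on community detection, yields substantial gains.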
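Understanding the latent causal factors of a dynamical system from visual observations is considered a crucial step towards agents that reason in complex environments. In this paper, we propose CITRIS, a variational autoencoder framework that learns causal representations from temporal sequences of images in which the underlying causal factors may have been intervened upon. In contrast to the recent literature, CITRIS exploits temporality and observed intervention targets to identify both scalar and multidimensional causal factors, such as 3D rotation angles. Furthermore, by introducing a normalizing flow, CITRIS can easily be extended to leverage and disentangle representations obtained by already pretrained autoencoders. Extending previous results on scalar causal factors, we prove identifiability in a more general setting, in which only some components of a causal factor are affected by interventions. In experiments on 3D rendered image sequences, CITRIS outperforms previous methods at recovering the underlying causal variables. Moreover, using pretrained autoencoders, CITRIS can even generalize to unseen instantiations of causal factors, opening future research directions in sim-to-real generalization for causal representation learning.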
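What can neural networks learn about the visual world from a single image? While a single image obviously cannot contain the multitude of possible objects, scenes and lighting conditions that exist within the space of all possible 256^(3x224x224) square images of size 224, it might still provide a strong prior for natural images. To analyze this hypothesis, we develop a framework for training neural networks from scratch on a single image by means of knowledge distillation from a supervised pretrained teacher. With this, we find the answer to the above question to be: "surprisingly, a lot". In quantitative terms, we find top-1 accuracies of 94%/74% on CIFAR-10/100, we also report results on ImageNet and, by extending the method to audio, we reach 84% on SpeechCommands. In extensive analyses, we disentangle the effects of augmentations, the choice of source image and the network architecture, and we also discover "panda neurons" in networks that have never seen a panda. This work shows that one image can be used to extrapolate to thousands of object classes, and it motivates a renewed research agenda on the fundamental interplay between augmentations and images.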
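The abstract above relies on knowledge distillation from a pretrained teacher. As a rough illustration of the usual form of such an objective, the sketch below computes the KL divergence between temperature-softened teacher and student distributions for one sample. The function names, the temperature parameter and the toy logits are assumptions of this sketch, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum before exponentiating
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) between softened class distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(
        t * math.log(t / s)
        for t, s in zip(p_teacher, p_student)
        if t > 0.0  # terms with zero teacher probability contribute nothing
    )


# Hypothetical logits for a 3-class problem.
matching = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. In training, it would be averaged over augmented crops of the source image.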
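We tackle the problem of learning object detectors without supervision. Unlike weakly supervised object detection, we do not assume image-level class labels. Instead, we extract a supervisory signal from audio-visual data, using the audio component to "teach" the object detector. While this problem is related to sound source localization, it is considerably harder, because the detector must classify objects by type, enumerate each instance of an object, and do so even when the object is silent. We address this by first designing a self-supervised framework with a contrastive objective that jointly learns to classify and localize objects. Then, without using any supervision, we simply use these self-supervised labels and boxes to train an image-based object detector. With this, we outperform previous unsupervised and weakly supervised detectors on the tasks of object detection and sound source localization. We also show that this detector can be aligned to ground-truth classes with as little as one label per pseudo-class, and that our method can learn to detect generic objects beyond musical instruments, such as airplanes and cats.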
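The abstract mentions a contrastive objective over audio-visual data without giving its form. The sketch below shows one common choice, an InfoNCE-style loss in which the audio and visual embeddings of the same clip are positives and all other pairings in the batch are negatives. Treating this as the objective is an assumption made for illustration. The function name `info_nce` and the temperature value are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(audio, visual, temperature=0.1):
    """Average InfoNCE loss; audio[i] and visual[i] come from the same clip."""
    n = len(audio)
    loss = 0.0
    for i in range(n):
        logits = [dot(audio[i], visual[j]) / temperature for j in range(n)]
        peak = max(logits)
        # log-sum-exp over all candidates, computed stably
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        loss += log_denominator - logits[i]  # negative log-probability of the positive
    return loss / n


# Toy unit-norm embeddings: correctly paired versus swapped.
audio = [[1.0, 0.0], [0.0, 1.0]]
paired = info_nce(audio, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(audio, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each audio embedding is closest to its own visual embedding and large when the pairing is wrong. The localization part of the framework is not covered here.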
Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks. However, doing so naively leads to ill-posed learning problems with degenerate solutions. In this paper, we propose a novel and principled learning formulation that addresses these issues. The method is obtained by maximizing the information between labels and input data indices. We show that this criterion extends standard cross-entropy minimization to an optimal transport problem, which we solve efficiently for millions of input images and thousands of labels using a fast variant of the Sinkhorn-Knopp algorithm. The resulting method is able to self-label visual data so as to train highly competitive image representations without manual labels. Our method achieves state-of-the-art representation learning performance for AlexNet and ResNet-50 on SVHN, CIFAR-10, CIFAR-100 and ImageNet and yields the first self-supervised AlexNet that outperforms the supervised Pascal VOC detection baseline. Code and models are available.
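To make the optimal-transport step more concrete, here is a small sketch of plain Sinkhorn-Knopp scaling on a matrix of predicted label probabilities. It alternately rescales columns and rows so that every label receives the same total mass and every sample keeps the same mass, which rules out the degenerate solution where all samples get one label. This is not the fast variant referred to in the abstract. The function name, the sharpening exponent `lam` and the iteration count are assumptions of this sketch.

```python
def sinkhorn_knopp(probs, lam=1.0, n_iters=100):
    """Balance an N x K matrix of positive probabilities.

    Returns a matrix Q whose rows each sum to 1/N and whose columns each
    sum to approximately 1/K (equal-sized label groups).
    """
    n, k = len(probs), len(probs[0])
    # Assumption: raising to the power `lam` sharpens the predictions.
    q = [[p ** lam for p in row] for row in probs]
    for _ in range(n_iters):
        # Column step: give every label a total mass of 1/K.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col_sums[j] * k) for j in range(k)] for i in range(n)]
        # Row step: give every sample a total mass of 1/N.
        row_sums = [sum(row) for row in q]
        q = [[v / (row_sums[i] * n) for v in row] for i, row in enumerate(q)]
    return q


# Toy predictions: every sample prefers label 0 before balancing.
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
balanced = sinkhorn_knopp(probs, lam=2.0, n_iters=200)
```

Pseudo-labels can then be read off per row, for example by taking the label with the largest entry in each row of the balanced matrix.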
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
We present the interpretable meta neural ordinary differential equation (iMODE) method to rapidly learn generalizable (i.e., not parameter-specific) dynamics from trajectories of multiple dynamical systems that vary in their physical parameters. The iMODE method learns meta-knowledge, the functional variations of the force field of dynamical system instances without knowing the physical parameters, by adopting a bi-level optimization framework: an outer level capturing the common force field form among studied dynamical system instances and an inner level adapting to individual system instances. A priori physical knowledge can be conveniently embedded in the neural network architecture as inductive bias, such as conservative force field and Euclidean symmetry. With the learned meta-knowledge, iMODE can model an unseen system within seconds, and inversely reveal knowledge on the physical parameters of a system, or serve as a Neural Gauge to "measure" the physical parameters of an unseen system from observed trajectories. We test the validity of the iMODE method on bistable, double pendulum, Van der Pol, Slinky, and reaction-diffusion systems.
While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet been sufficiently examined. Employing electroencephalography signals and a band-limited white noise stimulus at 4.8 Hz (the prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners and controls, thereby proposing a method to calculate directional connectivity. As causal relationships run in both directions, we explore three scenarios, namely channels' activity as sources, as sinks, and in total. Our proposed method can be used for both classification and exploratory analysis. In all scenarios, we find confirmation of the established right-lateralized Theta sampling network anomaly, in line with the temporal sampling framework's assumption of oscillatory differences in the Theta and Gamma bands. Further, we show that this anomaly primarily occurs in the causal relationships of channels acting as sinks, where it is significantly more pronounced than when only total activity is observed. In the sink scenario, our classifier obtains 0.84 and 0.88 accuracy and 0.87 and 0.93 AUC for the Theta and Gamma bands, respectively.